Uniform Indexing and Retrieval Scheme for Chinese, Japanese, and Korean

Authors

  • Da-Wei Juang
  • Yuen-Hsien Tseng
Abstract

This paper reports on our work at the third NTCIR workshop on the Chinese, Japanese, and Korean monolingual information retrieval (IR) subtasks. A Chinese IR system was applied to the document sets in all three languages. Because the system is based on an n-gram indexing model and a phrase formulation method that extracts longer key terms for indexing, no language-dependent modifications were made to apply it to Japanese and Korean IR. Our aim was to see whether a system originally designed for Chinese IR can still work for Japanese and Korean documents. The results show that it performs similarly across the document sets in the three languages.
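
The n-gram indexing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the function names (`char_ngrams`, `build_index`, `search`), the choice of overlapping character bigrams, the overlap-count scoring, and the three toy documents are all assumptions made for illustration, and the phrase formulation step is not shown. The point is only that character n-grams need no word segmenter, so the same code runs unchanged on Chinese, Japanese, and Korean text.

```python
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < n:
        # Text shorter than n: index it as a single term (or nothing if empty).
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(docs, n=2):
    """Build an inverted index mapping each n-gram to the set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n=2):
    """Rank doc ids by the number of distinct query n-grams they contain."""
    scores = defaultdict(int)
    for gram in set(char_ngrams(query, n)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy collection: one short document per language.
docs = {
    "zh": "資訊檢索系統",
    "ja": "情報検索システム",
    "ko": "정보 검색 시스템",
}
index = build_index(docs)
```

With this toy index, `search(index, "검색")` finds the Korean document and `search(index, "検索")` finds the Japanese one, with no language-specific code path. A real system would add term weighting and, as the abstract describes, longer key terms on top of the raw n-grams.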

Similar resources

Chinese-Japanese Cross Language Information Retrieval: A Han Character Based Approach

In this paper, we investigate cross language information retrieval (CLIR) for Chinese and Japanese texts utilizing the Han characters, the common ideographs used in writing the Chinese, Japanese and Korean (CJK) languages. The Unicode encoding scheme, which encodes the superset of Han characters, is used as a common encoding platform to deal with the multilingual collection in a uniform manner. We discu...

Monolingual Experiments with Far-East Languages in NTCIR-6

This paper describes our third participation in an evaluation campaign involving the Chinese, Japanese and Korean languages (NTCIR-6). Our participation is motivated by three objectives: 1) study the retrieval performances of various probabilistic and language models for these languages; 2) compare the relative retrieval effectiveness of a combined “unigram & bigram” indexing scheme combined wi...

Comparaison des stratégies d'indexation pour les langues asiatiques

In information retrieval, Chinese and Japanese present many challenging problems. Unlike in most European languages, the lack of explicit word boundaries is one of the most important issues for indexing. For this reason, many works have proposed different approaches to index documents or requests written in these languages. This article presents a comparison of the common indexing strategies. Mo...

Statistical and Comparative Evaluation of Various Indexing and Search Models

This paper first describes various strategies (character, bigram, automatic segmentation) used to index the Chinese (ZH), Japanese (JA) and Korean (KR) languages. Second, based on the NTCIR-5 test collections, it evaluates various retrieval models, ranging from classical vector-space models to more recent developments in probabilistic and language models. While no clear conclusion was reached fo...

NTCIR-4 Chinese, English, Korean Cross Language Retrieval Experiments Using PIRCS

In NTCIR-4 we participated in Korean, Chinese, English monolingual, Chinese-English, English-Korean bilingual, and Chinese-Korean cross language (using English as pivot) retrieval tasks based on our PIRCS retrieval system. The query translation approach was employed for CLIR. We combined two MT translations for Chinese-English, and two for English-Korean. For the latter, a web-based entity-orient...

Publication date: 2002